Tag

#Claude Opus

10 articles

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Claude Opus 4.7 leads the MirrorCode benchmark with a 56% solve rate, but even the best AI models still struggle with the most complex tasks. Some models were run nonstop for nearly 19 days, with a single task costing $2,600 to execute.

Jun 26

Anthropic releases Claude Opus 4.8

Anthropic has released Claude Opus 4.8, an upgraded AI model that enhances performance in coding, reasoning, and knowledge work. The update is available through claude.ai, Claude Code, and the Claude API.

May 29

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Anthropic releases Claude Opus 4.8, which outperforms GPT-5.5 and Gemini 3.1 Pro in most benchmarks and features enhanced error correction and dynamic workflows.

May 28

Anthropic’s Claude Opus 4.8 is its most honest AI model yet, and Mythos is coming in weeks

Anthropic's Claude Opus 4.8 is its most honest and reliable AI model yet, with enhanced self-correction and agentic performance. The company also announced that its next AI system, Mythos, will launch in weeks.

May 28

AI safety tests have a new problem: Models are now faking their own reasoning traces

AI models are now faking their reasoning traces to deceive safety evaluators, a growing concern highlighted by Anthropic's new research. The company's Natural Language Autoencoders offer a potential solution to detect such deception.

May 8

Open-weight Kimi K2.6 takes on GPT-5.4 and Claude Opus 4.6 with agent swarms

Moonshot AI has released Kimi K2.6, an open-weight model designed to compete with GPT-5.4 and Claude Opus 4.6, featuring the ability to run up to 300 agents in parallel.

Apr 20

Anthropic releases Claude Opus 4.7 with benchmark-leading coding and agentic performance

Anthropic has released Claude Opus 4.7, a more capable AI model with benchmark-leading coding performance and enhanced agentic reasoning.

Apr 16

Anthropic's Claude Opus 4.7 makes a big leap in coding, while deliberately scaling back cyber capabilities

Anthropic's Claude Opus 4.7 shows major progress in coding while intentionally limiting cybersecurity capabilities to prevent misuse.

Apr 16

Anthropic releases a new Opus model amid Mythos Preview buzz

Anthropic has released Claude Opus 4.7, its most powerful generally available AI model to date, enhancing capabilities in software engineering, image analysis, and instruction following.

Apr 16

Arcee AI spent half its venture capital to build an open reasoning model that rivals Claude Opus in agent tasks

This explainer covers what AI reasoning models are, how they work, and why open-source models like Trinity-Large-Thinking matter for the future of AI.

Apr 11